This example will use the following libraries:
library(dismo)
## Loading required package: raster
## Loading required package: sp
library(sp)
library(raster)
library(maps)
library(stringr)
First I need to define an raster::extent object for a box bounding NH and VT. This will be used to crop the data that we download. I can get a bounding box dynamically using drawExtent(). Click twice (upper corner/lower corner) on the map to select the region of interest. Don’t go too far outside the lines.
maps::map('state', region=c('new hampshire', 'vermont', 'new york', 'massachusetts'))
NHVT <- raster::drawExtent()
or I can use longitude/latitude values for the box and use extent():
NHVT <- raster::extent(-73.61056, -70.60205, 42.48873, 45.37969)
I download the shapefile for the NH and VT state borders using getData() which gives polygons for countries. Level 1 will be the state boundaries (I assume). The shape file has all the states. Then I use subset() to get the two states that I want. path says where to save the downloaded file.
usashp <- raster::getData('GADM', country='USA', level=1, path="data")
nhvtshp <- subset(usashp, NAME_1 %in% c("New Hampshire", "Vermont"))
Check the projection for this shapefile:
crs(nhvtshp)
## CRS arguments:
## +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
I downloaded this shapefile separately and read it in. This will get the boundary of the Hubbard Brook Experimental Forest from a shapefile.
hbshp <- raster::shapefile("data/hbef_boundary/hbef_boundary.shp")
I check its projection and note that it is different from the NH+VT shapefile.
crs(hbshp)
## CRS arguments:
## +proj=utm +zone=19 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m
## +no_defs
I transform the shapefile to get it on the same projection.
newcrs <- crs(nhvtshp)
hbshp <- sp::spTransform(hbshp, newcrs)
plot(nhvtshp, border="blue", axes=TRUE)
plot(hbshp, add=TRUE)
text(-71.8, 44, "HBEF", pos=4)
I will download occurrence data for Trillium grandiflorum and Trillium undulatum in my NH+VT bounding box from the Global Biodiversity Information Facility. nrecs seems to be ignored. geo means only points with longitude and latitude. removeZeros means get rid of NA in location. ext is the bounding box to use. The downloaded data has many columns that I don’t need. I will subset the following columns. select in the subset() call says what columns to use.
filePath <- file.path(here::here(), "data/trillium_presences.RData")
if(!file.exists(filePath)){
colsWeNeed <- c("species", "lat", "lon", "locality", "year", "coordinateUncertaintyInMeters", "occurrenceID", "occurrenceRemarks", "geodeticDatum")
grandiflorum <- dismo::gbif("Trillium",
species="grandiflorum",
nrecs=300, geo=TRUE,
removeZeros=TRUE, ext=NHVT)
grandiflorum <- subset(grandiflorum, select=colsWeNeed)
undulatum <- dismo::gbif("Trillium",
species="undulatum",
nrecs=300, geo=TRUE,
removeZeros=TRUE, ext=NHVT)
undulatum <- subset(undulatum, select=colsWeNeed)
trillium.raw <- rbind(grandiflorum, undulatum)
save(trillium.raw, file="data/trillium_presences.RData")
}else{
load("data/trillium_presences.RData")
}
Check the projection to make sure it makes sense and there is only one value. Check that it is the same projection as my other layers.
unique(trillium.raw$geodeticDatum) # "WGS84"
## [1] "WGS84"
trillium is just a data frame. I make it a sp object (specifically a SpatialPointsDataFrame) using sp::coordinates to specify which columns are the longitude and latitude.
trillium <- trillium.raw
sp::coordinates(trillium) <- c("lon", "lat")
Check that it looks ok and there are no NAs.
summary(trillium$lon)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -73.58 -73.04 -72.64 -72.44 -71.89 -70.61
summary(trillium$lat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 42.49 43.67 44.21 44.04 44.48 45.37
table(cut(trillium$coordinateUncertaintyInMeters, c(0,200,500,1000, 2000, 5000)))
##
## (0,200] (200,500] (500,1e+03] (1e+03,2e+03] (2e+03,5e+03]
## 1230 81 39 48 60
I am going to keep only those locations with a location accuracy within 200m.
good <- which(trillium$coordinateUncertaintyInMeters<200)
trillium <- trillium[good,]
Now I can plot the occurences points and add the NH and VT state boundaries. Trillium undulatum is much more common. Hubbard Brook is outlined in blue.
maps::map('state', xlim=NHVT[1:2], ylim=NHVT[3:4])
plot(subset(trillium, species=="Trillium grandiflorum"), pch=3, cex=1, add=TRUE)
plot(subset(trillium, species=="Trillium undulatum"), pch=4, cex=1, col="red", add=TRUE)
plot(hbshp, add=TRUE, border="blue")
I will use the getData function from the raster package to get climate data. I will retrieve global bioclimatic variables at 0.5’ resolution from WorldClim. This returns 19 bioclimatic variables. For the 0.5’ resolution, I need to give it a center longitude and latitude and then it returns like 60 degress longitude and latitude. This is 300+ Mb but it will check if the directory exists and won’t keep re downloading.
bioclimVars <- raster::getData(name="worldclim", #interpolated climate data
res = 0.5, # finest resolution
var = "bio", # the bio dataset
lon = mean(NHVT[1:2]),
lat = mean(NHVT[3:4]),
path = "data")
This is a raster stack.
class(bioclimVars)
A raster stack is collection of many raster layers with the same projection, spatial extent and resolution. I don’t know why _13 is appended to the bioclim names by getData().
raster::extent(bioclimVars) # lons are x and lats are y
## class : Extent
## xmin : -90
## xmax : -60
## ymin : 30
## ymax : 60
names(bioclimVars) # look at the file names you want in wc0.5 folder
## [1] "bio1_13" "bio2_13" "bio3_13" "bio4_13" "bio5_13" "bio6_13"
## [7] "bio7_13" "bio8_13" "bio9_13" "bio10_13" "bio11_13" "bio12_13"
## [13] "bio13_13" "bio14_13" "bio15_13" "bio16_13" "bio17_13" "bio18_13"
## [19] "bio19_13"
I will crop down to the NH+VT bounding box.
NHVTVars <- raster::crop(bioclimVars, NHVT)
Here I will plot just 3 of these variables.
sub.NHVTVars <- subset(NHVTVars, c("bio10_13", "bio18_13", "bio19_13"))
Plot my three variables. This takes awhile. Note bio10_13 is mean temperature times 10. That’s how GBIF records temperature.
plot(sub.NHVTVars)
Another why to make this stack is to read in the data from the downloaded files in dir("wc0.5"). Downloaded in .bil format and comprised of two files.
fils <- dir("data/wc0.5", full.names=TRUE)
fils <- fils[stringr::str_detect(fils, "bil")]
Plot one layer.
onelayer <- raster::raster(fils[1])
plot(onelayer, xlim=NHVT[1:2], ylim=NHVT[3:4])
# Add the NH VT lines on top
plot(nhvtshp, add=TRUE, border="blue")
plot(hbshp, add=TRUE)
title(names(onelayer))
You can create a stack from the file names and then crop to the NH+VT bounding box.
# But for some reason when it reads in the bil file, it is not setting the projection from
# the header file. So I'll use the
NHVTVars <- raster::stack(fils)
NHVTVars <- raster::crop(NHVTVars, NHVT)
I want to make a data frame to use to get the descriptions for the variables.
bioclimnames <- data.frame(
name=paste0("bio", 1:19, "_13"),
desc = c("Mean annual temperature","Mean diurnal range (mean of max temp - min temp)", "Isothermality (bio2/bio7) (* 100)", "Temperature seasonality (standard deviation *100)", "Max temperature of warmest month", "Min temperature of coldest month", "Temperature annual range (bio5-bio6)", "Mean temperature of the wettest quarter", "Mean temperature of driest quarter", "Mean temperature of warmest quarter", "Mean temperature of coldest quarter", "Total (annual) precipitation", "Precipitation of wettest month", "Precipitation of driest month", "Precipitation seasonality (coefficient of variation)", "Precipitation of wettest quarter", "Precipitation of driest quarter", "Precipitation of warmest quarter", "Precipitation of coldest quarter"),
col = c("mean.temp", "temp.diurnal.range", "isotherm", "temp.seasonality", "max.warm.temp", "min.cold.temp", "temp.annual.range", "mean.temp.wet.qtr", "mean.temp.dry.qtr", "mean.temp.warm.qtr", "mean.temp.cold.qtr", "total.precip", "precip.wet.month", "precip.dry.month", "precip.seasonality", "precip.wet.qtr", "precip.dry.qtr", "precip.warm.qtr", "precip.cold.qtr"),
stringsAsFactors=FALSE)
I can read this in with getData(). For USA, this returns a list of 4 raster layers. List 1 is mainland. Then I crop to my NHVT bounding box. I don’t need to re-download the data if I already have it.
dirPath <- "data/elevation"
if(!dir.exists(dirPath)) dir.create(dirPath)
if(!file.exists("data/USAelevation.grd")){
USAelevation <- raster::getData('alt',
country='USA',
path="data/elevation", # where to save
mask=FALSE # don't cut off at border
)
raster::writeRaster(USAelevation[[1]], filename="data/USAelevation.grd", overwrite=TRUE)
}else{
USAelevation <- raster::raster("data/USAelevation.grd")
}
NHVT.elevation <- raster::crop(USAelevation[[1]], NHVT)
These data are downloaded from EarthEnv land cover data set with a function called getlandcover.R in the code folder.
fils <- paste0("data/landcover/lc_1km_", c(1:12, "dom"), ".tif")
# check if the files already exist, and if not, download
if(!all(file.exists(fils)))
getlandcover(gisdir="data/landcover", ext=NHVT)
NHVT.landcover <- raster::stack(fils)
The names are cryptic. Fix that.
oldnames <- paste0("lc_1km_", c(1:12, "dom"))
newnames <- c("Mixed.Needleleaf.Trees", "Evergreen.Broadleaf.Trees", "Deciduous.Broadleaf.Trees", "Mixed.Other.Trees", "Shrubs", "Herbaceous", "Cultivated", "Flooded", "Urban", "Snow", "Barren", "Water", "Dominant.Land.Cover")
names(NHVT.landcover)[match(names(NHVT.landcover), oldnames)] <- newnames
I want to make a layer for trees.
for(i in 1:4){
fil <- paste0("data/landcover/lc_1km_",i,".tif")
r <- raster(fil)
if(i==1) rt <- r else rt <- r+rt
}
NHVT.Trees <- rt
names(NHVT.Trees) <- "Tree.Cover"
Trillium undulatum is associated with tree cover.
plot(NHVT.Trees)
plot(nhvtshp, add=TRUE,border="blue")
plot(hbshp, add=TRUE)
title("Tree Cover")
plot(subset(trillium, species=="Trillium undulatum"), pch=".", cex=2, col="red", add=TRUE)
plot(hbshp, add=TRUE, border="blue")
The terrain() function will return the slope and aspect from an elevation layer.
NHVT.slope <- raster::terrain(NHVT.elevation, opt='slope')
NHVT.aspect <- raster::terrain(NHVT.elevation, opt='aspect')
Plot of aspect.
plot(NHVT.aspect)
plot(nhvtshp, add=TRUE)
There is also a hillShade() function that makes a pretty elevation plot.
hill <- raster::hillShade(NHVT.slope, NHVT.aspect, 40, 270)
plot(hill, col=grey(0:100/100), legend=FALSE, main='NH and VT elevation')
plot(NHVT.elevation, col=rainbow(25, alpha=0.35), add=TRUE)
plot(nhvtshp, add=TRUE)
plot(hbshp, add=TRUE)
First, I will create a stack with all my variables.
allVars <- raster::stack(NHVTVars, NHVT.elevation, NHVT.slope, NHVT.aspect, NHVT.landcover, NHVT.Trees)
The names in allVars are annoying, so I’ll give it better names.
names(allVars)
## [1] "bio1_13" "bio2_13"
## [3] "bio3_13" "bio4_13"
## [5] "bio5_13" "bio6_13"
## [7] "bio7_13" "bio8_13"
## [9] "bio9_13" "bio10_13"
## [11] "bio11_13" "bio12_13"
## [13] "bio13_13" "bio14_13"
## [15] "bio15_13" "bio16_13"
## [17] "bio17_13" "bio18_13"
## [19] "bio19_13" "USA1_alt"
## [21] "slope" "aspect"
## [23] "Mixed.Needleleaf.Trees" "Evergreen.Broadleaf.Trees"
## [25] "Deciduous.Broadleaf.Trees" "Mixed.Other.Trees"
## [27] "Shrubs" "Herbaceous"
## [29] "Cultivated" "Flooded"
## [31] "Urban" "Snow"
## [33] "Barren" "Water"
## [35] "Dominant.Land.Cover" "Tree.Cover"
I will write code to assign the right names. That way I won’t risk giving the wrong names to columns.
# these are the annoying names
oldcols <- c(bioclimnames$name, "USA1_alt")
# better names
newcols <- c(bioclimnames$col, "elevation")
for(i in 1:length(oldcols)) names(allVars)[names(allVars)==oldcols[i]] <- newcols[i]
The temperature returned by GBIF is temperature x 10 so I will fix that.
for(i in names(allVars)[stringr::str_detect(names(allVars), "temp")])
allVars[[i]] <- allVars[[i]]/10
Finally, I will save allVars to a object. This is mainly for R Markdown because it is complaining when I try to plot.
rf <- raster::writeRaster(allVars, filename="data/allVars.grd", overwrite=TRUE)
Plot.
plot(rf)
Next I create a data frame with the values of the variables where Trillium locations are using the extract() function. This takes a raster layer or stack of layers and the point location data (as a spatial points object). I use cellnumbers=TRUE to help identify duplicates.
trillVars <- data.frame(raster::extract(allVars, trillium, cellnumbers=TRUE))
I add on columns for the species name, “presence-absence” and lat/lon.
trillVars$species <- trillium$species
trillVars$pa <- 1
trillVars$lon <- trillium$lon
trillVars$lat <- trillium$lat
Duplicates means multiple occurrences assigned to the same cell. We can find this by seeing how many values in the cells column are duplicates. There are many. Our raster cells are 0.5 x 0.5 degrees (so 1km or so?). Trillium occurrences in that same cell (lat/lon values that are in the same 0.5 square grid) will have the same cell number so will be “duplicates”, meaning an occurrence in a cell where there is already another occurrence.
# this is where both cell and species are the same
dups <- duplicated(cbind(trillVars$cells, trillVars$species))
sum(dups)
## [1] 478
I get rid of them by saying take the non-duplicated cell values.
trillVars <- trillVars[!dups,]
Now I have about half the number of lines of data.
dim(trillVars)
## [1] 750 41
I will add on 5000 random background points by sampling from all the cells in allVars. I want to keep the lon/lat information (cell centers).
background <- data.frame(raster::sampleRandom(allVars, size=5000, cells=TRUE, xy=TRUE))
I need to fix the first three column names since I will be appending this data frame to the one above and that one uses cells as the cell column name. Also add on species name and presence-absence info. I make two copies of the background to make it easier to subset data for each species later.
names(background)[1:3] <- c("cells", "lon", "lat")
background$pa <- 0
background <- rbind(background, background)
background$species <- c(rep("Trillium grandiflorum", 5000), rep("Trillium undulatum", 5000))
In order to bind this data frame to the presence one, I need the column names to also be in the same order. I need to fix that since sampleRandom() put the lat/lon columns in columns 2 and 3. Put background columns in the same order as trillVars. The first line of code is selecting columns from background in the order shown after the ,.
background <- background[,colnames(trillVars)]
identical(colnames(background), colnames(trillVars))
## [1] TRUE
Now I bind this two data frames together for final data frame with occurrences, climatic data, and my background zeros.
dat <- rbind(trillVars, background)
The rownames are annoying so I will set to NULL.
rownames(dat) <- NULL
I save the objects that I will need later except for allVars. R often complains when I try to use allVars by reloading this saved RData file because raster objects are associated with files and this raster object in memory is associated with a temporary file. When I use allVars later, I will load allVars via a call to brick later and I specify the file source explicitly. allVars was saved to a .grd file earlier in the script.
save(trillium, dat, nhvtshp, hbshp, bioclimnames, NHVT, file="data/trillium_with_predictors.RData")